Abstract: Diffusion-based large multimodal models, such as LLaDA-V, have demonstrated impressive capabilities in vision-language understanding and generation. However, their bidirectional attention mechanism and diffusion-style iterative denoising paradigm introduce significant computational overhead, as visual tokens are repeatedly processed across all layers and denoising steps. In this work, we conduct an in-depth attention analysis and reveal that, unlike autoregressive decoders, LLaDA-V aggregates cross-modal information predominantly in middle-to-late layers, leading to delayed semantic alignment. Motivated by this observation, we propose a structured token pruning strategy inspired by FastV that removes a proportion of visual tokens at designated layers, reducing FLOPs while preserving critical semantic information. To the best of our knowledge, this is the first work to investigate structured token pruning in diffusion-based large multimodal models. Unlike FastV, which prunes at shallow layers, our method targets the middle-to-late layers of the first denoising step, matching LLaDA-V's delayed attention aggregation so that output quality is maintained; because pruning happens in the first step, the savings carry over to all subsequent denoising steps. Our framework provides an empirical basis for efficient LLaDA-V inference and highlights the potential of vision-aware pruning in diffusion-based multimodal models. Across multiple benchmarks, our best configuration reduces computational cost by up to 65% while retaining, on average, 95% of task performance.
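The abstract does not state how the pruned visual tokens are chosen. The following is a minimal sketch that assumes FastV's criterion: visual tokens are ranked by the average attention they receive at the pruning layer, and only the top fraction is kept. The function name `prune_visual_tokens` and its parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def prune_visual_tokens(hidden, attn, vis_start, vis_end, keep_ratio):
    """Hypothetical FastV-style pruning of visual tokens at one layer.

    hidden: (seq_len, d) hidden states at the pruning layer.
    attn:   (heads, seq_len, seq_len) attention weights at that layer,
            indexed as [head, query, key].
    Visual tokens are assumed to occupy positions [vis_start, vis_end).
    Returns the pruned hidden states and the kept position indices.
    """
    # Average attention each key position receives, over heads and queries.
    received = attn.mean(axis=(0, 1))  # shape (seq_len,)
    vis_scores = received[vis_start:vis_end]
    n_keep = int(round((vis_end - vis_start) * keep_ratio))
    # Highest-scoring visual tokens, restored to their original order.
    top = np.sort(np.argsort(vis_scores)[::-1][:n_keep]) + vis_start
    # Non-visual tokens before and after the visual span are always kept.
    keep = np.concatenate([
        np.arange(vis_start),
        top,
        np.arange(vis_end, hidden.shape[0]),
    ])
    return hidden[keep], keep
```

In the setting described above, such a call would presumably be made once, at a middle-to-late layer of the first denoising step, with the kept indices reused for the remaining steps. The specific layer and keep ratio are not given in the abstract.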




Abstract: Intelligent reflecting surfaces (IRSs) have emerged as a transformative technology for wireless networks, improving coverage, capacity, and energy efficiency by intelligently manipulating the wireless propagation environment. This paper provides a comprehensive study of the deployment and coordination of IRSs in wireless networks. Addressing both single- and multi-reflection IRS architectures, we examine deployment strategies across diverse scenarios, including point-to-point, point-to-multipoint, and point-to-area setups. For the single-reflection case, we highlight the trade-offs between passive and active IRS architectures in terms of beamforming gain, coverage extension, and spatial multiplexing. For the multi-reflection case, we discuss practical strategies for optimizing IRS deployment and element allocation that balance cooperative beamforming gain against path loss. The paper further discusses practical challenges in IRS implementation, including environmental conditions, system compatibility, and hardware limitations. Numerical results and field tests validate the effectiveness of IRS-aided wireless networks and demonstrate their capacity and coverage improvements. Lastly, promising research directions, including movable IRSs, near-field deployments, and network-level optimization, are outlined to guide future investigations.
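The beamforming gain mentioned above can be illustrated with a minimal sketch. It assumes a single-antenna transmitter and receiver, one passive IRS with N unit-modulus elements, and no direct link; the helper names are hypothetical. When each element's phase is set to cancel the phase of its cascaded channel, the N reflected paths add coherently, so the received power grows as N^2 rather than roughly N.

```python
import numpy as np

def irs_received_power(h, g, theta):
    """Received power |sum_n h_n * exp(j*theta_n) * g_n|^2 via N passive elements.

    h: (N,) complex channels from transmitter to each IRS element.
    g: (N,) complex channels from each IRS element to receiver.
    theta: (N,) phase shifts applied by the IRS (unit amplitude assumed).
    """
    return np.abs(np.sum(h * np.exp(1j * theta) * g)) ** 2

def aligned_phases(h, g):
    # Cancel the phase of each cascaded channel so all paths add in phase.
    return -np.angle(h * g)

# Example: N unit-magnitude channels with random phases.
rng = np.random.default_rng(0)
N = 64
h = np.exp(1j * rng.uniform(0, 2 * np.pi, N))
g = np.exp(1j * rng.uniform(0, 2 * np.pi, N))
p_aligned = irs_received_power(h, g, aligned_phases(h, g))
p_unaligned = irs_received_power(h, g, np.zeros(N))
```

With unit-magnitude channels, `p_aligned` equals N^2. Leaving the phases unoptimized typically yields power on the order of N.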




Abstract: In the Internet of Things, there is an urgent need for sensors with large-scale sensing capability in scenarios such as intelligent monitoring of production lines and urban infrastructure. Brillouin optical time domain analysis (BOTDA) sensors, which can monitor thousands of continuous points simultaneously, show great advantages in these applications. We propose a convolutional neural network (CNN) to process the data of conventional BOTDA sensors. The CNN achieves an unprecedented performance improvement, directly retrieving a higher spatial resolution (SR) from sensing systems that use long pump pulses. Using simulated Brillouin gain spectra (BGSs) as the CNN input and the corresponding high-SR Brillouin frequency shift (BFS) as the output target, the trained CNN obtains an SR higher than the theoretical value determined by the pump pulse width. In the experiment, the CNN accurately retrieves 0.5-m hotspots from BGSs measured with pump pulses of 20 to 50 ns, and the acquired BFS agrees well with 45/40 ns differential pulse-width pair (DPP) measurement results. Compared with the DPP technique, the proposed CNN achieves a 2-fold improvement in BFS uncertainty with only half the measurement time. In addition, by changing the training datasets, the proposed CNN can achieve tunable high-SR retrieval on conventional long-pulse BOTDA sensors without any hardware modification. The proposed data post-processing approach paves the way for novel high-spatial-resolution BOTDA sensors, offering substantial improvements over state-of-the-art techniques in system complexity, measurement time, and reliability.
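The "theoretical value determined by the pump pulse width" is the standard BOTDA relation Δz = cτ / (2n_g). The following minimal sketch assumes a group index of 1.468 for standard single-mode fiber and uses hypothetical helper names. By this relation, 20 to 50 ns pulses nominally resolve only about 2 to 5 m. The 45/40 ns DPP pair corresponds to a 5 ns width difference, or roughly 0.5 m, which is consistent with the hotspot size reported above.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s
N_G = 1.468        # assumed group index of standard single-mode fiber

def spatial_resolution(pulse_width_s, n_g=N_G):
    """Nominal BOTDA spatial resolution in metres: dz = c * tau / (2 * n_g)."""
    return C * pulse_width_s / (2.0 * n_g)

def dpp_resolution(tau1_s, tau2_s, n_g=N_G):
    """Differential pulse-width pair: resolution set by the pulse-width difference."""
    return spatial_resolution(abs(tau1_s - tau2_s), n_g)

for tau_ns in (20, 50):
    print(f"{tau_ns} ns pulse -> {spatial_resolution(tau_ns * 1e-9):.2f} m")
print(f"45/40 ns DPP -> {dpp_resolution(45e-9, 40e-9):.2f} m")
```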